Discriminative Linear Transforms for Speaker Adaptation
Abstract
Linear transform adaptation techniques such as Maximum Likelihood Linear Regression (MLLR) are a popular and effective family of methods for speaker adaptation. MLLR estimates transform parameters for Gaussian means and variances using a maximum likelihood (ML) objective function. This paper discusses the use of an alternative, discriminative objective function for linear transform estimation, which is an interpolation of the maximum mutual information (MMI) objective function and the ML criterion. The resulting Discriminative Linear Transform (DLT) reduces the word error rate on the adaptation data more directly than MLLR and, assuming good generalisation, will also reduce test-set error rates. The implementation of DLT estimation is discussed, and test-data recognition results are compared with those from standard unconstrained MLLR (mean and variance adaptation) on the 1994 WSJ/NAB spoke 3 non-native adaptation task. The results show that relative reductions in word error rate of between 7% and 19% can be obtained by using DLTs.
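The two ingredients named in the abstract, an affine transform of the Gaussian means and an objective that interpolates MMI with ML, can be illustrated with a toy sketch. The following is not the paper's implementation: the function names, the convex weighting `alpha * MMI + (1 - alpha) * ML`, the uniform class priors and the per-frame (rather than per-utterance) formulation are all assumptions made for illustration.

```python
import numpy as np

def adapt_means(means, A, b):
    """Apply an affine transform mu' = A @ mu + b to each Gaussian mean (rows of `means`)."""
    return means @ A.T + b

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def interpolated_objective(x, means, var, correct, alpha):
    """Toy single-frame objective: alpha * MMI + (1 - alpha) * ML (assumed form).

    ML term:  log-likelihood of the correct class.
    MMI term: log posterior of the correct class, assuming uniform priors.
    """
    logliks = np.array([log_gauss_diag(x, m, var) for m in means])
    ml = logliks[correct]
    mmi = ml - np.logaddexp.reduce(logliks)  # subtract log of summed likelihoods
    return alpha * mmi + (1 - alpha) * ml

# Hypothetical two-class example with a pure bias transform (A = identity).
means = np.array([[0.0, 0.0], [2.0, 2.0]])
A, b = np.eye(2), np.array([1.0, -1.0])
adapted = adapt_means(means, A, b)
x, var = np.array([1.0, -1.0]), np.ones(2)
score = interpolated_objective(x, adapted, var, correct=0, alpha=0.5)
```

With `alpha = 0` the score reduces to the ML criterion that MLLR optimises; with `alpha = 1` it is the pure MMI term, which also penalises likelihood assigned to competing classes.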
Related papers
Discriminative Fuzzy Clustering Maximum a Posterior Linear Regression for Speaker Adaptation
We propose a discriminative fuzzy clustering maximum a posterior linear regression (DFCMAPLR) model adaptation approach to compensate for the acoustic mismatch due to speaker variability. The DFCMAPLR approach adopts the MAP criterion and a discriminative objective function to estimate shared affine transform and fuzzy weight sets, respectively. Then, through a linear combination of the calculated ...
Discriminative adaptation for log-linear acoustic models
Log-linear models have recently been used in acoustic modeling for speech recognition systems. This has been motivated by competitive results compared to systems based on Gaussian models, and a more direct parametrisation of the posterior model. To competitively use log-linear models for speech recognition, important methods, such as speaker adaptation, have to be reformulated in a log-linear f...
Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems
Speaker variability is one of the major error sources for ASR systems. Speaker adaptation estimates speaker-specific models from the speaker-independent ones to minimize the mismatch between the training and testing conditions arising from speaker variability. One of the commonly adopted approaches is the transformation-based method. In this paper, the discriminative input and output transform...
Adaptive training using discriminative mapping transforms
Speaker adaptive training (SAT) is a useful technique for building speech recognition systems on non-homogeneous data. When combining SAT with discriminative training criteria, maximum likelihood (ML) transforms are often used for unsupervised adaptation tasks. This is because discriminatively estimated transforms are highly sensitive to errors in the supervision hypothesis. In this paper, spea...
Speaker Adaptation for Continuous Density HMMs: A Review
This paper reviews some popular speaker adaptation schemes that can be applied to continuous density hidden Markov models. These fall into three families based on MAP adaptation; linear transforms of model parameters such as maximum likelihood linear regression; and speaker clustering/speaker space methods such as eigenvoices. The strengths and weaknesses of each adaptation family are discussed...
Publication date: 2001